Abstract

Mobile robot navigation using visual sensors requires
that a robot be able to detect landmarks and obtain
pose information from a camera image. This paper
presents a vision system for finding man-made markers
of known size and calculating the pose of these
markers. The algorithm detects and identifies the
markers using a weighted pattern matching template.
Geometric constraints are then used to calculate the
position of the markers relative to the robot. The
geometric constraints are chosen to match the typical
pose of most man-made signs: the sign stands vertical
and its dimensions are known. The system has been
tested successfully on a wide range of real images.
Marker detection is reliable, even in cluttered
environments. Under certain marker orientations,
orientation estimates have proven accurate to within
2 degrees and distance estimates to within 0.3 meters.

Task description 

Humans are very dependent on their sense of sight 
for navigation. People use both natural and man- 
made landmarks to help them determine where they 
are and which way they want to go next. What hu- 
mans can do with the greatest of ease, however, can 
be very difficult for robots. Mobile robot navigation 
using visual sensors typically requires that the robot 
be able to obtain pose information from a camera im- 
age. This task often includes recognizing markers or 
other known objects in the image and calculating the 
object pose from the size and appearance. 

A robot navigating by vision must deal with several
tasks: it must be able to extract markers from a
complex environment; it has to recognize these markers
from many different points of view; and it must
determine, from its view of the marker, the pose (3D
position and orientation) of the marker. In addition,
for all practical purposes, the robot should be able
to perform all of these tasks relatively fast (in less
than a few seconds in most cases).

This paper describes a vision system that was
implemented for the AAAI 1993 Robot Competition in
Washington, D.C., on July 11-16, 1993. All vision
processing was performed onboard the robot using an
80486 DOS-based PC. A complete description of the
design of the University of Michigan entry can be
found in [1].

The vision system is divided into a marker extraction
and identification step and a pose estimation step.
The system finds predefined markers (black 'x's and
'+'s on a white background) in the environment and
determines their pose relative to the robot. Thus, a
robot using this system should be able to navigate
autonomously using visual sensors in a
semi-constrained environment. The required geometric
constraints are: the marker must stand vertical; the
marker and camera have no roll; the focal length of
the camera and the camera's location relative to the
robot are known; the robot is oriented in the plane
perpendicular to the marker; and the width and height
of the marker are known. Though these constraints may
seem restrictive, they are typical of most man-made
signs such as traffic signs and office door markers.

Marker detection 

The marker detection phase is composed of two main
routines: the connected components routine and the
marker identification routine. The detection phase
must be both fast and accurate for the system to be
useful for most real-world tasks.

To maximize speed, we make only one pass through 
the entire image. During the pass, the image is 
thresholded and connected components are found and 
labeled. One pixel components are ignored and not 
labeled. Size thresholding then filters out most of 
the non-marker components. Only one pass is made 
through all possible connected components. Figure 1 
shows sample output from this stage. The possible 
markers are outlined with a bounding box. 
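
The extraction step described above can be sketched in
a few lines of Python. This is only an illustration
under stated assumptions, not the competition code: it
uses a breadth-first flood fill with 8-connectivity
rather than the single-pass labeling described in the
text, it treats pixels darker than `threshold` as
marker pixels, and it measures component size as a
pixel count. The function name `find_candidates` and
all parameter names are hypothetical.

```python
from collections import deque

def find_candidates(image, threshold, min_pixels, max_pixels):
    """Return bounding boxes (top, left, bottom, right) of dark components.

    image is a list of rows of gray levels; pixels below threshold count
    as black. Components whose pixel count falls outside
    [min_pixels, max_pixels] are discarded, so min_pixels >= 2 drops
    one-pixel components.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] >= threshold:
                continue
            # Flood-fill one connected component (8-connectivity).
            seen[r][c] = True
            queue = deque([(r, c)])
            size, top, left, bottom, right = 0, r, c, r, c
            while queue:
                y, x = queue.popleft()
                size += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny in (y - 1, y, y + 1):
                    for nx in (x - 1, x, x + 1):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx]
                                and image[ny][nx] < threshold):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Size thresholding: keep only plausible marker components.
            if min_pixels <= size <= max_pixels:
                boxes.append((top, left, bottom, right))
    return boxes

W, B = 255, 0  # white background, black marker pixels
image = [
    [W, W, W, W, W, B],  # isolated pixel, rejected by the size filter
    [W, W, B, W, W, W],
    [W, B, B, B, W, W],  # a small '+' shape
    [W, W, B, W, W, W],
    [W, W, W, W, W, W],
]
boxes = find_candidates(image, threshold=128, min_pixels=2, max_pixels=100)
print(boxes)
```

Each returned box is a candidate that would then be
passed to the identification routine.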

To identify or reject the remaining markers, a
weighted pattern matching template is used. An n x n
template matrix is created for each marker (see
Figure 2). Increasing n increases the resolution of
the template, but also increases the processing time.
We found n = 7 to be a good compromise. The weighted
template indicates which areas are expected to be
black and which white. The weights for our matrix are
currently determined by trial and error, but we could
easily replace these with machine-generated weights if
a learning program were implemented. The marker
template that a component most resembles is selected
as the “guess” for that component. The program
generates a certainty measure with each guess (see
Figure 3) and uses this measure to accept or reject
the guess.

Figure 1: The first image is a typical input image.
The second image shows the markers that are
detected by the connected components routine.
These markers will be identified as x, +, or
neither.

Figure 2: Weighted pattern templates for the x and
the + markers. Positive values indicate expected
black areas; negative values indicate areas
expected to be white. Certainty increases with
magnitude.

\[ \mathrm{Certainty}_x = \frac{\sum_r \sum_c f_x(r,c)}{\sum_r \sum_c |x_{rc}|} = \frac{92}{96} = 0.9583 \]

\[ \mathrm{Certainty}_+ = \frac{\sum_r \sum_c f_p(r,c)}{\sum_r \sum_c |p_{rc}|} = \frac{50}{140} = 0.3571 \]

\[ \mathrm{Certainty} = \max(\mathrm{Certainty}_x, \mathrm{Certainty}_+) \qquad (1) \]

\[ f_x(r,c) = \begin{cases} |x_{rc}| & \text{if correct color} \\ 0 & \text{otherwise} \end{cases} \qquad f_p(r,c) = \begin{cases} |p_{rc}| & \text{if correct color} \\ 0 & \text{otherwise} \end{cases} \]

Figure 3: Sample marker with calculated x and +
certainty values. “b” indicates a black pixel;
“w” indicates a white pixel; x refers to the x
template; p refers to the + template; r counts
rows; c counts columns. For this example, the
program is 95.8% certain that the sample marker
is an x and 35.7% certain that it is a +.

Each marker can have one or more templates. The 
additional templates may be used to improve marker 
recognition from other views. 
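
The certainty measure of Equation 1 can be sketched as
follows. This is a minimal illustration rather than
the original implementation: the 3 x 3 templates and
their weights are made up for brevity (the paper uses
7 x 7 templates with hand-tuned weights), the
acceptance threshold of 0.8 is an assumption, and the
names `certainty` and `classify` are hypothetical. The
component is assumed to have been resampled already to
a grid of booleans the same size as the template.

```python
def certainty(template, pixels):
    """Fraction of template weight that agrees with the observed pixels.

    template: n x n weights, positive where black is expected and
              negative where white is expected.
    pixels:   n x n booleans, True where the component is black.
    """
    matched = total = 0.0
    for weights, row in zip(template, pixels):
        for weight, is_black in zip(weights, row):
            total += abs(weight)
            expected_black = weight > 0
            if weight != 0 and expected_black == is_black:
                matched += abs(weight)  # correct color: count |weight|
    return matched / total if total else 0.0

def classify(templates, pixels, accept=0.8):
    """Pick the best label; each label may own one or more templates."""
    scores = {label: max(certainty(t, pixels) for t in candidates)
              for label, candidates in templates.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= accept else None), scores[best]

# Illustrative 3 x 3 weights only.
X3 = [[2, -1, 2], [-1, 3, -1], [2, -1, 2]]
PLUS3 = [[-1, 2, -1], [2, 3, 2], [-1, 2, -1]]
templates = {"x": [X3], "+": [PLUS3]}

T, F = True, False
sample = [[F, T, F], [T, T, T], [F, T, F]]  # a perfect '+'

label, score = classify(templates, sample)
print(label, score, certainty(X3, sample))
```

Returning `None` when the best score falls below the
threshold corresponds to rejecting the guess.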

Two types of heuristic information are also used in
identifying the markers. Some heuristics were known
before the program was written. Because every + has a
vertical line down the center of its bounding box, no
matter what the robot's relative position, the center
line is strongly emphasized in the + template. Other
heuristics were not learned or incorporated until
after the program had been tested. Diagonal lines
often scored high enough certainty values to be
considered x's. Adding a specific test to verify that
each possible x is not a diagonal line solved this
problem.

Pose estimation 


The three dimensional position and orientation 
(pose) of the markers is also determined. Such in- 
formation is useful for performing further analysis. 
One possible application of the pose estimation
algorithm is the detection of road signs. Once a
sign's pose is calculated, the pixels corresponding to
the sign can be mapped to an orthographic projection.
Since virtually all character recognition algorithms
assume an orthographic projection, this would allow
for much improved character recognition.

For the robot competition, the pose of the mark- 
ers also represents the pose of the box to which the 
marker is attached. One phase of the competition re- 
quires the robot to autonomously move the box from 
one location to another. The marker pose is used to 
guide the robot to the box such that the box can be 
pushed to the appropriate location. 

Geometric constraints are used to calculate the
position of the markers relative to the robot. First,
the marker is expected to be mounted on a planar
surface, and its four corners are expected to be
detected by the low-level image processing (marker
extraction and identification). The marker's
dimensions are also known in advance. Second, the
marker is standing vertical. As mentioned before, this
is not an unreasonable constraint, as many man-made
signs stand vertical. Finally, the calibration
parameters of the camera are known, including the
orientation of the camera relative to the robot and
the camera's focal length. Also, there should be
minimal* camera roll (rotation about the Z axis).

These geometric constraints form a set of 24 equations
in 18 unknowns defining the position of the four
corners of the markers. This overconstrained set of
equations is solved using the method of least squares.
The final result is the 3D position of the four
corners of the markers. For the given application, the
orientation of the markers and the distance to the
center of the marker are calculated from the four 3D
positions. These two values are used by the robot to
navigate to the markers so that more accurate
identification and pose calculations can be made.

Utilizing Geometric Constraints 

Figure 4 depicts the geometry of the imaging process
with the bounding box of a '+' marker being mapped to
the image plane. Both the width (w) and height (h) of
the markers are known. The three-dimensional unit
direction vectors $n_1$, $n_2$, $n_3$, and $n_4$,
which are directed from the known focal center of the
camera F towards the unknown marker position vectors
$P_1$, $P_2$, $P_3$, and $P_4$, are calculated. This
calculation is feasible given the position of the
focal center of the camera F and the four sensor
plane 2D position vectors $p_1$, $p_2$, $p_3$, and
$p_4$. These 2D vectors correspond to the mapping of
the corners of the markers onto the sensor plane. Due
to the imaging process, the distances $d_1$, $d_2$,
$d_3$, and $d_4$ are unknown (where $d_n$ is the
distance in 3D space from $p_n$ to $P_n$). Figure 5
shows the coordinate frame assigned to the camera's
sensor plane and its relation to the camera's 3D
coordinate frame $\Psi_c$, the image coordinate frame,
and the robot coordinate frame $\Psi_r$.

*Current experimentation indicates that tilt of the
marker or of the camera of up to 10 degrees does not
significantly affect the calculation of the position
of the marker. In addition, the effect on the
orientation also seems negligible relative to other
errors. Further testing is being performed.

Figure 4: Mapping of objects onto the image plane.

Figure 5: Locations of coordinate frames.

It is assumed that the camera focal length is known
and that the pose of the camera relative to the robot
is also known. All points are then transformed to the
robot coordinate frame $\Psi_r$. This results in the
following equations of known vectors:

\[ n_1 = [-p_{1x}, -p_{1y}, f] \qquad (2) \]

\[ n_2 = [-p_{2x}, -p_{2y}, f] \qquad (3) \]

\[ n_3 = [-p_{3x}, -p_{3y}, f] \qquad (4) \]

\[ n_4 = [-p_{4x}, -p_{4y}, f]. \qquad (5) \]

The vector equations with unknowns are:

\[ P_1 = d_1 n_1 \qquad (6) \]

\[ P_2 = d_2 n_2 \qquad (7) \]

\[ P_3 = d_3 n_3 \qquad (8) \]

\[ P_4 = d_4 n_4. \qquad (9) \]


In addition, the following constraint equations arise
given that the marker is standing vertical and that
the camera and marker have no roll (rotation about the
Z axis). Here $d_1$, $d_2$, $d_3$, and $d_4$ are the
distances from the camera focal center to the unknown
3D points $P_1$, $P_2$, $P_3$, and $P_4$.

\[ P_{2x} = d_1 n_{1x} + w\, n_{wx} \qquad (10) \]

\[ P_{2y} = d_1 n_{1y} + w\, n_{wy} \qquad (11) \]

\[ P_{3x} = d_4 n_{4x} + w\, n_{wx} \qquad (12) \]

\[ P_{3y} = d_4 n_{4y} + w\, n_{wy} \qquad (13) \]

\[ P_{4z} = d_1 n_{1z} + h \qquad (14) \]

\[ P_{3z} = d_2 n_{2z} + h \qquad (15) \]

\[ d_1 n_{1z} = d_2 n_{2z} \qquad (16) \]

\[ d_4 n_{4z} = d_3 n_{3z} \qquad (17) \]

\[ P_{4x} = P_{1x} \qquad (18) \]

\[ P_{4y} = P_{1y} \qquad (19) \]

\[ P_{3x} = P_{2x} \qquad (20) \]

\[ P_{3y} = P_{2y}. \qquad (21) \]


These equations can be expressed as an overconstrained
system of linear equations, with the above 24 scalar
equations (Equations 6 through 21) and the 18 unknowns
of $d_1$, $d_2$, $d_3$, $d_4$, $P_1$, $P_2$, $P_3$,
$P_4$, and $n_w$. The two-dimensional unit vector
$n_w$ has an x and a y component: $n_{wx}$ is the x
component of the vector pointing from $P_1$ to $P_2$,
and $n_{wy}$ is the y component of this vector. There
is no z component to $n_w$ since the markers and the
camera are assumed to have no roll.

Equations 2 through 21 result in the matrix equation

\[ y = A x, \qquad (22) \]

where y is the 24-element known vector

\[ y = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, h, 0, 0, 0, 0) \qquad (23) \]

and x is the 18-element unknown vector

\[ x = (P_{1x}, P_{2x}, P_{3x}, P_{4x}, P_{1y}, P_{2y}, P_{3y}, P_{4y}, P_{1z}, P_{2z}, P_{3z}, P_{4z}, d_1, d_2, d_3, d_4, n_{wx}, n_{wy}) \qquad (24) \]


and the matrix A is the following:

[Matrix A (24 x 18): entries not legible.]
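
To make the least squares step concrete, the sketch
below assembles a 24 x 18 system from Equations 6
through 21 and solves it with the normal equations in
plain Python. Several things here are assumptions and
not taken from the paper: the left-hand sides of
Equations 10 through 13 are read as the x and y
components of $P_2$ and $P_3$, the row order is simply
the order of the equation numbers, the direction
vectors are taken as already expressed in a robot
frame with z vertical, and the helper names
(`build_system`, `least_squares`, `P`, `D`) are
hypothetical. The corner positions used to test it are
synthetic.

```python
import math

def P(i, k):
    """Column of component k (0=x, 1=y, 2=z) of corner i (1..4) in x."""
    return 4 * k + (i - 1)

def D(i):
    """Column of the distance d_i in x."""
    return 12 + (i - 1)

NWX, NWY = 16, 17  # columns of nw_x and nw_y

def build_system(n, w, h):
    """n maps corner index 1..4 to its direction vector (nx, ny, nz)."""
    A, y = [], []

    def add(terms, value=0.0):
        row = [0.0] * 18
        for col, coeff in terms:
            row[col] += coeff
        A.append(row)
        y.append(value)

    for i in (1, 2, 3, 4):                       # Equations 6-9
        for k in range(3):
            add([(P(i, k), 1.0), (D(i), -n[i][k])])
    add([(P(2, 0), 1.0), (D(1), -n[1][0]), (NWX, -w)])  # (10)
    add([(P(2, 1), 1.0), (D(1), -n[1][1]), (NWY, -w)])  # (11)
    add([(P(3, 0), 1.0), (D(4), -n[4][0]), (NWX, -w)])  # (12)
    add([(P(3, 1), 1.0), (D(4), -n[4][1]), (NWY, -w)])  # (13)
    add([(P(4, 2), 1.0), (D(1), -n[1][2])], h)          # (14)
    add([(P(3, 2), 1.0), (D(2), -n[2][2])], h)          # (15)
    add([(D(1), n[1][2]), (D(2), -n[2][2])])            # (16)
    add([(D(4), n[4][2]), (D(3), -n[3][2])])            # (17)
    add([(P(4, 0), 1.0), (P(1, 0), -1.0)])              # (18)
    add([(P(4, 1), 1.0), (P(1, 1), -1.0)])              # (19)
    add([(P(3, 0), 1.0), (P(2, 0), -1.0)])              # (20)
    add([(P(3, 1), 1.0), (P(2, 1), -1.0)])              # (21)
    return A, y

def least_squares(A, y):
    """Solve min |Ax - y| via the normal equations (A^T A) x = A^T y."""
    m, n = len(A), len(A[0])
    M = [[sum(A[r][i] * A[r][j] for r in range(m)) for j in range(n)]
         + [sum(A[r][i] * y[r] for r in range(m))] for i in range(n)]
    for c in range(n):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def unit(p):
    length = math.sqrt(sum(c * c for c in p))
    return tuple(c / length for c in p)

# Synthetic vertical marker (meters), camera focal center at the origin.
w, h = 0.23, 0.30
nw = (0.5, math.sqrt(0.75))                  # horizontal unit vector
p1 = (2.0, -0.3, -0.2)
p2 = (p1[0] + w * nw[0], p1[1] + w * nw[1], p1[2])
corners = {1: p1, 2: p2, 3: (p2[0], p2[1], p2[2] + h),
           4: (p1[0], p1[1], p1[2] + h)}
n = {i: unit(p) for i, p in corners.items()}

A, y = build_system(n, w, h)
x = least_squares(A, y)
print([round(x[P(1, k)], 4) for k in range(3)])  # recovered P1
```

With noise-free directions the recovered corners match
the synthetic ones; with real measurements the system
is inconsistent and the least squares fit spreads the
error across all 24 equations.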
Results

The accuracy of the pose estimation algorithm is 
measured by the error between the estimated and true 
marker distance and orientation. Robustness refers to 
the program’s ability to detect markers and make rea- 
sonable pose estimations in complex situations such 
as cluttered images, tilted camera, uneven floor, etc. 
A set of experiments has been performed to test
these measures.

The testing of this vision system has produced
promising results. Marker extraction and
identification are very accurate, even in cluttered
images. Markers can be extracted at orientations of up
to 60 degrees. Pose estimation is possible in the
range of one to seven meters. Distance can be
determined to within 0.2 meters when the marker is at
an orientation of 50 degrees. Marker orientation can
be estimated to within 1 degree; because the ground
truth measurements of orientation are themselves
accurate to only approximately 1 degree, any error at
this resolution could come from either the vision
system or the ground truth measurements of the marker
orientation. These results were obtained on
low-resolution images of 315 by 200 pixels. Figure 6
shows two sample images with the calculated marker
pose projected onto the images.

Figure 6: Two sample images with the calculated
marker pose projected on top.

The system should be able to extract all markers, and
only markers, in an image. If a tradeoff must be made,
it is preferred that non-markers be identified as
markers rather than that real markers be missed. The
robot can then approach a possible false marker and
perform further analysis to determine whether it is in
fact a false positive. To make such an analysis more
tractable, the vision system should output a
confidence value with each marker sighting, which
would be used by the robot to determine which markers
need further analysis. With each classification, the
marker detection algorithm generates a certainty value
as given in Equation 1, and the pose estimation
algorithm generates $\delta$, the residual from the
least squares fit:

\[ \delta = A x - y \qquad (25) \]

These two values, residual and certainty, are
available to the robot to help determine whether to
accept the marker and its pose.
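
As an illustration of how the two confidence values
might be combined, the sketch below evaluates the
residual of Equation 25 for a small overdetermined
system and applies a simple acceptance rule. The
acceptance thresholds and the function names are
assumptions made for this example; the paper does not
state how the robot weighs the two values.

```python
import math

def residual(A, x, y):
    """Residual vector delta = A x - y of a least squares fit."""
    return [sum(a * xi for a, xi in zip(row, x)) - yi
            for row, yi in zip(A, y)]

def residual_norm(A, x, y):
    """Euclidean length of the residual vector."""
    return math.sqrt(sum(r * r for r in residual(A, x, y)))

def accept_sighting(certainty, res_norm,
                    min_certainty=0.8, max_residual=0.05):
    """Hypothetical rule: need a confident match and a tight fit."""
    return certainty >= min_certainty and res_norm <= max_residual

# Toy system with three equations and two unknowns; x is its
# least squares solution.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.3]
x = [1.1, 2.1]

norm = residual_norm(A, x, y)
print(round(norm, 4), accept_sighting(0.9583, norm))
```

A sighting that fails the rule would be a natural
candidate for the closer inspection described above.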

The experiments involved processing 42 images, each
having two to four markers. Only once did the marker
detection step identify a non-marker object as a
marker (a false positive). The program missed existing
markers only when they were oriented at angles greater
than 50 degrees, and it often detected markers at
orientations of up to 70 degrees.

The original purpose of the marker size threshold was
to eliminate obvious non-marker components as soon as
possible and to reduce the number of connected
components that are processed by the marker
identification routine. If the user can set the
threshold to limit the size of the markers to a small
range, fewer extraneous components are processed by
the marker identification routine, reducing the chance
of false positives. Unfortunately, a small range also
limits the distance at which markers can be
recognized. During testing, it was found that a narrow
size threshold was not crucial for accurate
identification. Marker sizes in the images from the
distance experiments ranged from about 50 pixels at
seven meters to over 1000 pixels at one meter. Even
with such a wide size range, the program returned a
false positive only once, while successfully finding
over 100 markers in 42 test images.

Figure 7: Plot of the error in calculated distance as
actual distance increases and with zero box
orientation.

Figures 7, 8, and 9 are plots of some of the
experiments. The first plot displays the calculated
distance error as a function of distance to the
marker. All tests resulted in an error of less than
0.4 meters, with over half being less than 0.2 meters.
As expected, the results show that the error generally
increases as the distance from the object increases.
The main exception is the two data points around 6
meters, which have a very small error. More data
points are needed to determine whether this is due to
some unforeseen anomaly of the algorithm or just
chance, as we suspect it is.

Figure 8 displays the results from the experiment 
to test the distance accuracy as a function of marker 
orientation. The marker detection algorithm can not 
reliably segment markers at orientations above 60 de- 
grees, hence the orientation plots only extend from 
zero to 60 degrees. An orientation of 0 degrees cor- 
responds to the marker being perpendicular to the 
imaging plane. All these tests were from a distance of 
2.16 meters. The distance error is within 0.13 meters 
with a marker orientation between zero degrees and 
50 degrees. 

Figure 9 represents the experiment to test the
orientation calculation accuracy as a function of
marker orientation. All the tests were again from a
distance of 2.16 meters. This plot displays the
interesting feature that the error is minimal between
30 and 60 degrees. Also, the error increases from 30
degrees back to 0 degrees of marker orientation. This
effect is due to the perspective transformation: when
objects are perpendicular to the imaging plane, small
perturbations in the object's orientation make even
smaller changes in the view as mapped to the imaging
plane. The small perturbation effects increase as the
angle increases (the object becoming less
perpendicular to the imaging plane). As a result, when
the object is almost perpendicular to the imaging
plane, fairly large changes in the orientation of the
marker are needed to account for small changes in the
mapping of the marker onto the image plane. Hence,
small changes in marker orientation go unnoticed by
the algorithm when the object orientation is much less
than 30 degrees. Perhaps more appropriately, small
errors in the pixel locations of the four corners of
the marker result in large changes in the computed
object orientation when orientations are less than 30
degrees. Errors in the marker detection algorithm
therefore become more crucial at small orientation
angles, and our experimental results confirm this.

Figure 8: Plot of the error in calculated distance as
box orientation changes and at constant distance
of 2.16 meters.

Figure 9: Plot of the error in calculated box
orientation as actual box orientation increases.

This marker orientation sensitivity can also be shown
analytically. Figure 10 shows a two-dimensional
representation of the problem. For our experiments,
the variables f, D, and L are known and have values of
0.0085 meters, 2.16 meters, and 0.23 meters
respectively. f corresponds to the camera focal
length, D to the distance from the camera to the
marker, and L to the width of the marker. The
following equations are basic geometry equations from
Figure 10:

\[ L_p = \frac{D\,l}{f} \qquad (26) \]

\[ \theta = 180 - \arctan(f/l) \qquad (27) \]

\[ \beta = \arcsin\left(\frac{D\,l}{f\,L}\,\sin\theta\right) \qquad (28) \]

\[ \alpha = 180 - \theta - \beta. \qquad (29) \]

Figure 10: The two-dimensional representation of the
orientation of the marker relative to the imaging
plane.

Now solving for $\alpha$ as a function of $l$,

\[ \alpha(l) = \arctan\left(\frac{f}{l}\right) - \arcsin\left(\frac{D}{L}\,\frac{1}{\sqrt{1 + f^2/l^2}}\right), \qquad (30) \]

and its derivative with respect to $l$ is

\[ \frac{d\alpha}{dl} = -\frac{f}{l^2}\left(1 + \frac{f^2}{l^2}\right)^{-1} - \frac{D}{L}\,\frac{f^2}{l^3}\left(1 + \frac{f^2}{l^2}\right)^{-3/2}\left[1 - \frac{D^2}{L^2}\left(1 + \frac{f^2}{l^2}\right)^{-1}\right]^{-1/2}. \qquad (31) \]

Figure 11 is the plot of $\alpha(l)$ for values of $l$
from zero to $fL/D$, and Figure 12 is a plot of
$d\alpha/dl$ for the same range of $l$. Notice the
sharp knee in $d\alpha/dl$ at $l \approx 0.0007$. This
shows that for $l < 0.0007$ meters the magnitude of
the rate of change of $\alpha$ with respect to $l$ is
fairly constant and small. However, for $l > 0.0007$
meters, the magnitude of this rate of change increases
very rapidly, meaning that small perturbations in the
length $l$ (the measured width of the marker) result
in large changes in the marker orientation. When the
marker detection process introduces small spatial
measurement errors, for example due to quantization of
the image and to the marker segmentation process
itself, the resulting estimated orientation errors may
be very large when $l > 0.0007$ meters. This
corresponds to the experimental results shown in
Figure 9. Also, from the plots in Figures 11 and 12,
the location of the knee at 0.0007 meters corresponds
to an angle of approximately 0.6 radians, or 34
degrees. This in turn corresponds to the experimental
finding that the orientation error increases for
values of marker orientation less than approximately
30 degrees.
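
The knee can be checked numerically. The sketch below
evaluates the orientation $\alpha(l)$ and its
derivative for the stated values of f, D, and L. The
closed forms used are the ones that follow from
Equations 27 through 29 by substitution, with the
derivative obtained by differentiating that
expression, so they should be read as a reconstruction
under that assumption; the function names are
hypothetical.

```python
import math

F, D, L = 0.0085, 2.16, 0.23  # focal length, distance, marker width (m)

def alpha(l):
    """Marker orientation (radians) as a function of image width l."""
    return (math.atan(F / l)
            - math.asin((D / L) / math.sqrt(1.0 + (F / l) ** 2)))

def dalpha_dl(l):
    """Derivative of alpha with respect to l (radians per meter)."""
    q = 1.0 + (F / l) ** 2
    first = -(F / l ** 2) / q
    second = (-(D / L) * (F ** 2 / l ** 3) * q ** -1.5
              / math.sqrt(1.0 - (D / L) ** 2 / q))
    return first + second

for l in (0.0001, 0.0003, 0.0005, 0.0007, 0.0008, 0.0009):
    print(f"l={l:.4f} m  alpha={math.degrees(alpha(l)):6.2f} deg  "
          f"dalpha/dl={dalpha_dl(l):9.1f} rad/m")
```

At l = 0.0007 meters this gives an orientation of
about 0.61 radians, in line with the value read from
the plots, and the magnitude of the derivative grows
quickly as l approaches fL/D.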

Conclusions 

Results from this project indicate that it is possible
to obtain useful pose information from a camera image
in real time on a general-purpose computer such as an
80486-based PC. Additional tests on the sensitivity of
pose estimation to various parameters, such as focal
length perturbations and marker size, are planned. In
addition, we will be studying the tradeoffs between
processing time (i.e., image resolution) and accuracy.

Figure 11: Plot of $\alpha(l)$.

Figure 12: Plot of $d\alpha/dl$.